feat(ci): add ephemeral Brev E2E test infrastructure#813
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a Brev-based end-to-end testing flow: a new GitHub Actions workflow to provision ephemeral Brev instances, a Vitest E2E project that rsyncs and runs remote tests over SSH with injected secrets, an accompanying Vitest test file, minor shell test script tweaks, and a gitignore update. Changes
Sequence DiagramsequenceDiagram
participant GHA as GitHub Actions
participant Brev as Brev CLI
participant Instance as Brev Instance
participant SSH as SSH
participant Suite as Vitest Suite
GHA->>Brev: create instance (CPU, detached)
Brev-->>Instance: provision instance
GHA->>Instance: wait for IP / readiness
GHA->>SSH: poll SSH ("ok") with retries & brev refresh
SSH-->>GHA: SSH reachable
GHA->>SSH: rsync repo (excludes)
SSH-->>Instance: transfer files
GHA->>SSH: run remote bootstrap with exported secrets
SSH-->>Instance: execute setup script
GHA->>Suite: invoke Vitest remotely
Suite->>Instance: run remote test scripts
Instance-->>Suite: return test results (PASS/FAIL)
GHA->>Brev: destroy instance
Brev-->>Instance: terminate
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~38 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
.github/workflows/hardware-e2e.yaml (1)
50-52: Static concurrency group may cancel unrelated PR runs.The group
hardware-e2eis shared across all workflow runs. If PR#1triggers a run and PR#2triggers another, the first gets cancelled. This may be intentional to limit Brev resource usage, but could cause frustration in high-traffic scenarios.Consider including context in the group name if parallel runs for different PRs are desired:
♻️ Optional: Isolate concurrency by PR
concurrency: - group: hardware-e2e + group: hardware-e2e-${{ inputs.pr_number || github.run_id }} cancel-in-progress: true🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/hardware-e2e.yaml around lines 50 - 52, The concurrency group "hardware-e2e" is static and will cancel unrelated PR runs; change the concurrency.group value to include a run-specific identifier (e.g., PR number or ref) so runs are isolated per PR or branch. Update the concurrency.group (the key shown as "group: hardware-e2e") to an expression like hardware-e2e-${{ github.event.pull_request.number || github.ref_name }} (or use github.run_id for full uniqueness) so each PR/branch gets its own concurrency group and unrelated runs won't cancel each other.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/hardware-e2e.yaml:
- Around line 35-38: The input declaration for test_suite currently has both
required: true and a default ("full") which makes the default unused; fix by
deciding one behavior: either remove required: true so the workflow_call input
test_suite can be omitted and the default "full" will apply, or remove the
default value and keep required: true so callers must always supply test_suite;
update the test_suite input block accordingly (the input named test_suite in the
workflow_call inputs).
In `@scripts/e2e-brev-test.sh`:
- Around line 111-123: The run_test function currently interpolates
NVIDIA_API_KEY directly into the remote command string (in run_test), exposing
the secret; change the implementation to stop embedding ${NVIDIA_API_KEY} in the
SSH command and instead transmit it securely to the remote side (for example,
send the key over SSH stdin and set it on the remote shell before running the
test, or enable SSH env forwarding with SendEnv/AcceptEnv and use NVIDIA_API_KEY
from the remote environment). Update the ssh invocation in run_test so the
remote shell receives and exports NVIDIA_API_KEY without it appearing in the
logged command string, while preserving the rest of the behavior (exports for
npm_config_prefix, PATH, NEMOCLAW_* and the tee/PIPESTATUS handling).
- Around line 100-105: The SSH invocation in scripts/e2e-brev-test.sh currently
interpolates NVIDIA_API_KEY and GITHUB_TOKEN into the remote command string (the
ssh call that cd's into $REMOTE_DIR and runs bash scripts/brev-setup.sh),
exposing secrets in process listings; change this to pass secrets via stdin or a
remote temp file instead of embedding them in the command: send a here-doc or
pipe that sets/exports NVIDIA_API_KEY and GITHUB_TOKEN (and any NEMOCLAW_* vars)
and then invokes scripts/brev-setup.sh on the remote, or alternatively write
them to a secured temp file on the remote and source it before running
brev-setup.sh; ensure you remove the temp file afterward and avoid interpolating
the secret variables directly into the ssh command string.
---
Nitpick comments:
In @.github/workflows/hardware-e2e.yaml:
- Around line 50-52: The concurrency group "hardware-e2e" is static and will
cancel unrelated PR runs; change the concurrency.group value to include a
run-specific identifier (e.g., PR number or ref) so runs are isolated per PR or
branch. Update the concurrency.group (the key shown as "group: hardware-e2e") to
an expression like hardware-e2e-${{ github.event.pull_request.number ||
github.ref_name }} (or use github.run_id for full uniqueness) so each PR/branch
gets its own concurrency group and unrelated runs won't cancel each other.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 1f81ef1a-cc14-4dd4-9dcf-4f6a3544ccbd
📒 Files selected for processing (3)
.github/workflows/hardware-e2e.yaml.gitignorescripts/e2e-brev-test.sh
scripts/e2e-brev-test.sh
Outdated
| ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \ | ||
| export NVIDIA_API_KEY='${NVIDIA_API_KEY}' && \ | ||
| export GITHUB_TOKEN='${GITHUB_TOKEN}' && \ | ||
| export NEMOCLAW_NON_INTERACTIVE=1 && \ | ||
| export NEMOCLAW_SANDBOX_NAME=e2e-test && \ | ||
| bash scripts/brev-setup.sh" 2>&1 | tee -a "$LOG_FILE" |
There was a problem hiding this comment.
Secrets embedded in SSH command string are visible in process listings.
The NVIDIA_API_KEY and GITHUB_TOKEN values are interpolated into the SSH command line. This exposes them in:
- Local process listings (
ps aux) during execution - Potentially shell history or debug logs on the remote
Consider passing secrets more securely, for example via stdin:
🔒 Proposed fix to avoid secrets in command line
-ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \
- export NVIDIA_API_KEY='${NVIDIA_API_KEY}' && \
- export GITHUB_TOKEN='${GITHUB_TOKEN}' && \
- export NEMOCLAW_NON_INTERACTIVE=1 && \
- export NEMOCLAW_SANDBOX_NAME=e2e-test && \
- bash scripts/brev-setup.sh" 2>&1 | tee -a "$LOG_FILE"
+ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \
+ export NVIDIA_API_KEY=\"\$(cat)\" && \
+ export GITHUB_TOKEN='${GITHUB_TOKEN}' && \
+ export NEMOCLAW_NON_INTERACTIVE=1 && \
+ export NEMOCLAW_SANDBOX_NAME=e2e-test && \
+ bash scripts/brev-setup.sh" <<< "$NVIDIA_API_KEY" 2>&1 | tee -a "$LOG_FILE"Alternatively, write secrets to a temporary file on the remote and source it, or use ssh -o SendEnv= if the remote sshd is configured to accept environment variables.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \ | |
| export NVIDIA_API_KEY='${NVIDIA_API_KEY}' && \ | |
| export GITHUB_TOKEN='${GITHUB_TOKEN}' && \ | |
| export NEMOCLAW_NON_INTERACTIVE=1 && \ | |
| export NEMOCLAW_SANDBOX_NAME=e2e-test && \ | |
| bash scripts/brev-setup.sh" 2>&1 | tee -a "$LOG_FILE" | |
| ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \ | |
| export NVIDIA_API_KEY=\"\$(cat)\" && \ | |
| export GITHUB_TOKEN='${GITHUB_TOKEN}' && \ | |
| export NEMOCLAW_NON_INTERACTIVE=1 && \ | |
| export NEMOCLAW_SANDBOX_NAME=e2e-test && \ | |
| bash scripts/brev-setup.sh" <<< "$NVIDIA_API_KEY" 2>&1 | tee -a "$LOG_FILE" |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/e2e-brev-test.sh` around lines 100 - 105, The SSH invocation in
scripts/e2e-brev-test.sh currently interpolates NVIDIA_API_KEY and GITHUB_TOKEN
into the remote command string (the ssh call that cd's into $REMOTE_DIR and runs
bash scripts/brev-setup.sh), exposing secrets in process listings; change this
to pass secrets via stdin or a remote temp file instead of embedding them in the
command: send a here-doc or pipe that sets/exports NVIDIA_API_KEY and
GITHUB_TOKEN (and any NEMOCLAW_* vars) and then invokes scripts/brev-setup.sh on
the remote, or alternatively write them to a secured temp file on the remote and
source it before running brev-setup.sh; ensure you remove the temp file
afterward and avoid interpolating the secret variables directly into the ssh
command string.
scripts/e2e-brev-test.sh
Outdated
| run_test() { | ||
| local test_script="$1" | ||
| local test_name="$2" | ||
| info "Running $test_name..." | ||
| ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \ | ||
| export npm_config_prefix=\$HOME/.local && \ | ||
| export PATH=\$HOME/.local/bin:\$PATH && \ | ||
| export NVIDIA_API_KEY='${NVIDIA_API_KEY}' && \ | ||
| export NEMOCLAW_NON_INTERACTIVE=1 && \ | ||
| export NEMOCLAW_SANDBOX_NAME=e2e-test && \ | ||
| bash $test_script" 2>&1 | tee -a "$LOG_FILE" | ||
| return "${PIPESTATUS[0]}" | ||
| } |
There was a problem hiding this comment.
Same secret exposure issue in run_test function.
Line 118 embeds NVIDIA_API_KEY in the command string, same concern as the bootstrap step. Apply the same mitigation strategy here.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/e2e-brev-test.sh` around lines 111 - 123, The run_test function
currently interpolates NVIDIA_API_KEY directly into the remote command string
(in run_test), exposing the secret; change the implementation to stop embedding
${NVIDIA_API_KEY} in the SSH command and instead transmit it securely to the
remote side (for example, send the key over SSH stdin and set it on the remote
shell before running the test, or enable SSH env forwarding with
SendEnv/AcceptEnv and use NVIDIA_API_KEY from the remote environment). Update
the ssh invocation in run_test so the remote shell receives and exports
NVIDIA_API_KEY without it appearing in the logged command string, while
preserving the rest of the behavior (exports for npm_config_prefix, PATH,
NEMOCLAW_* and the tee/PIPESTATUS handling).
scripts/e2e-brev-test.sh
Outdated
There was a problem hiding this comment.
Why build a test runner from scratch? We should write this as vitest and get assertions, reporting, async behavior etc for free.
There was a problem hiding this comment.
fair point. putting that in
8aede10 to
2f9ba13
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
scripts/e2e-brev-test.sh (1)
103-108:⚠️ Potential issue | 🟠 MajorStop embedding secrets in remote command strings (still exposed).
NVIDIA_API_KEY/GITHUB_TOKENare still interpolated into SSH command arguments. Pass them via stdin (or remote temp file) instead.🔒 Proposed fix pattern for both SSH calls
ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \ - export NVIDIA_API_KEY='${NVIDIA_API_KEY}' && \ - export GITHUB_TOKEN='${GITHUB_TOKEN}' && \ - export NEMOCLAW_NON_INTERACTIVE=1 && \ - export NEMOCLAW_SANDBOX_NAME=e2e-test && \ - bash scripts/brev-setup.sh" 2>&1 | tee -a "$LOG_FILE" + :" # keep structure for context + +printf '%s\n%s\n' "$NVIDIA_API_KEY" "$GITHUB_TOKEN" | \ +ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd '$REMOTE_DIR' && \ + IFS= read -r NVIDIA_API_KEY && \ + IFS= read -r GITHUB_TOKEN && \ + export NVIDIA_API_KEY GITHUB_TOKEN && \ + export NEMOCLAW_NON_INTERACTIVE=1 && \ + export NEMOCLAW_SANDBOX_NAME=e2e-test && \ + bash scripts/brev-setup.sh" 2>&1 | tee -a "$LOG_FILE" @@ ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd $REMOTE_DIR && \ export npm_config_prefix=\$HOME/.local && \ export PATH=\$HOME/.local/bin:\$PATH && \ - export NVIDIA_API_KEY='${NVIDIA_API_KEY}' && \ - export NEMOCLAW_NON_INTERACTIVE=1 && \ - export NEMOCLAW_SANDBOX_NAME=e2e-test && \ - bash $test_script" 2>&1 | tee -a "$LOG_FILE" + :" # keep structure for context + printf '%s\n' "$NVIDIA_API_KEY" | \ + ssh -o LogLevel=ERROR "$INSTANCE_NAME" "cd '$REMOTE_DIR' && \ + IFS= read -r NVIDIA_API_KEY && \ + export npm_config_prefix=\$HOME/.local && \ + export PATH=\$HOME/.local/bin:\$PATH && \ + export NVIDIA_API_KEY && \ + export NEMOCLAW_NON_INTERACTIVE=1 && \ + export NEMOCLAW_SANDBOX_NAME=e2e-test && \ + bash '$test_script'" 2>&1 | tee -a "$LOG_FILE"Also applies to: 118-124
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/e2e-brev-test.sh` around lines 103 - 108, The SSH invocation in the e2e-brev-test.sh script currently interpolates secrets (NVIDIA_API_KEY and GITHUB_TOKEN) into the remote command string; change the ssh call that runs "cd $REMOTE_DIR && ... bash scripts/brev-setup.sh" so secrets are not embedded: send NVIDIA_API_KEY and GITHUB_TOKEN via stdin or write them to a secure remote temp file before invoking bash, then source/read them in the remote environment (keep exports for NEMOCLAW_* as-is), and ensure the remote temp file is removed; update both occurrences of the ssh command (the block using INSTANCE_NAME, REMOTE_DIR, and LOG_FILE around lines shown and the second SSH at 118-124) so no secret values are interpolated into the command string.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@scripts/e2e-brev-test.sh`:
- Around line 46-49: The cleanup() function unconditionally calls brev delete
"$INSTANCE_NAME", which can remove a pre-existing instance if creation/login
failed; add a boolean flag (e.g., INSTANCE_CREATED or CREATED_BY_RUN) that is
set to true only after successful creation in the create/login flow, then modify
cleanup() to only call brev delete when that flag is true; apply the same guard
to the other deletion call referenced around lines 67-68 so deletions only occur
for instances created by this run and not for pre-existing resources.
---
Duplicate comments:
In `@scripts/e2e-brev-test.sh`:
- Around line 103-108: The SSH invocation in the e2e-brev-test.sh script
currently interpolates secrets (NVIDIA_API_KEY and GITHUB_TOKEN) into the remote
command string; change the ssh call that runs "cd $REMOTE_DIR && ... bash
scripts/brev-setup.sh" so secrets are not embedded: send NVIDIA_API_KEY and
GITHUB_TOKEN via stdin or write them to a secure remote temp file before
invoking bash, then source/read them in the remote environment (keep exports for
NEMOCLAW_* as-is), and ensure the remote temp file is removed; update both
occurrences of the ssh command (the block using INSTANCE_NAME, REMOTE_DIR, and
LOG_FILE around lines shown and the second SSH at 118-124) so no secret values
are interpolated into the command string.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 3b9d7093-3ee9-40bf-8f2b-c129a308733f
📒 Files selected for processing (2)
scripts/e2e-brev-test.shtest/e2e/test-full-e2e.sh
✅ Files skipped from review due to trivial changes (1)
- test/e2e/test-full-e2e.sh
scripts/e2e-brev-test.sh
Outdated
| cleanup() { | ||
| info "Cleaning up: deleting instance $INSTANCE_NAME..." | ||
| brev delete "$INSTANCE_NAME" 2>/dev/null || warn "Failed to delete instance (may not exist)" | ||
| info "Log saved to $LOG_FILE" |
There was a problem hiding this comment.
Guard cleanup deletion with a “created-by-this-run” flag.
cleanup() always calls brev delete "$INSTANCE_NAME" even if creation never succeeded. A failed login/create path can delete a pre-existing instance with the same name.
🔧 Proposed fix
REMOTE_EXIT_CODE=0
+INSTANCE_CREATED=0
# ── Cleanup trap — always delete the instance ────────────────────
cleanup() {
- info "Cleaning up: deleting instance $INSTANCE_NAME..."
- brev delete "$INSTANCE_NAME" 2>/dev/null || warn "Failed to delete instance (may not exist)"
+ if [ "$INSTANCE_CREATED" -eq 1 ]; then
+ info "Cleaning up: deleting instance $INSTANCE_NAME..."
+ brev delete "$INSTANCE_NAME" 2>/dev/null || warn "Failed to delete instance (may not exist)"
+ else
+ info "Cleanup: no instance created by this run; skipping delete"
+ fi
info "Log saved to $LOG_FILE"
}
trap cleanup EXIT
@@
brev create "$INSTANCE_NAME" --cpu "$BREV_CPU" --detached 2>&1 || fail "Failed to create instance"
+INSTANCE_CREATED=1
info "Instance creation started"Also applies to: 67-68
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/e2e-brev-test.sh` around lines 46 - 49, The cleanup() function
unconditionally calls brev delete "$INSTANCE_NAME", which can remove a
pre-existing instance if creation/login failed; add a boolean flag (e.g.,
INSTANCE_CREATED or CREATED_BY_RUN) that is set to true only after successful
creation in the create/login flow, then modify cleanup() to only call brev
delete when that flag is true; apply the same guard to the other deletion call
referenced around lines 67-68 so deletions only occur for instances created by
this run and not for pre-existing resources.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
test/e2e/brev-e2e.test.js (2)
74-87: Minor: Consider using native Node.js sleep instead of spawning a shell.Using
execSync("sleep ...")works but spawns a shell process for each iteration. A native approach would be more idiomatic, though it would require makingwaitForSshasync.♻️ Alternative using native setTimeout
-function waitForSsh(maxAttempts = 60, intervalMs = 5_000) { +async function waitForSsh(maxAttempts = 60, intervalMs = 5_000) { + const sleep = (ms) => new Promise((r) => setTimeout(r, ms)); for (let i = 1; i <= maxAttempts; i++) { try { ssh("echo ok", { timeout: 10_000 }); return; } catch { if (i === maxAttempts) throw new Error(`SSH not ready after ${maxAttempts} attempts`); if (i % 5 === 0) { try { brev("refresh"); } catch { /* ignore */ } } - execSync(`sleep ${intervalMs / 1000}`); + await sleep(intervalMs); } } }Note: This would require
beforeAllto be async and callawait waitForSsh().🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/brev-e2e.test.js` around lines 74 - 87, The current waitForSsh function blocks by spawning a shell with execSync("sleep ..."); refactor waitForSsh to be async (function waitForSsh(...)) and replace execSync sleep with a non-blocking await (e.g., await new Promise(resolve => setTimeout(resolve, intervalMs))); keep the retry logic, timeouts for ssh("echo ok", { timeout: 10_000 }) and the brev("refresh") calls but await where needed, and update any callers (e.g., beforeAll) to be async and call await waitForSsh(...).
137-143: Instance cleanup may not run on forcible termination.If the test process receives SIGKILL (e.g., CI timeout, OOM),
afterAllwon't execute and the Brev instance will be orphaned. Consider documenting this limitation or adding external cleanup (e.g., a scheduled job to delete stalee2e-pr-*instances).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/e2e/brev-e2e.test.js` around lines 137 - 143, The afterAll cleanup (in test/e2e/brev-e2e.test.js) that calls brev("delete", INSTANCE_NAME) won't run on forcible termination (SIGKILL), so add mitigation: either update the repository docs/README or e2e test README to document this limitation and advise running an external scheduled cleanup, or add an auxiliary maintenance job (e.g., a CI scheduled workflow or cron job) that deletes stale instances matching the e2e prefix (e.g., e2e-pr-*) by invoking the same brev delete logic; ensure the instructions reference the afterAll cleanup and the brev("delete", INSTANCE_NAME) behavior so maintainers know what external cleanup must replicate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/e2e-brev.yaml:
- Around line 128-134: The workflow's "Upload test logs" step is trying to
upload /tmp/brev-e2e-*.log from the runner but the logs live on the remote Brev
VM; add a prior step (e.g., "Download remote logs" or within the SSH teardown
flow that runs before the "Upload test logs" step) that copies the remote log
files (e.g., /tmp/nemoclaw-*-e2e-*.log) from the VM to the runner workspace (for
example via scp or an ssh command that streams files into $GITHUB_WORKSPACE or
/tmp on the runner), then update the Upload test logs step to point to the
downloaded location so artifacts are actually uploaded. Ensure the download step
runs regardless of test failure so logs are available to upload.
In `@test/e2e/brev-e2e.test.js`:
- Around line 54-60: The secret exports in sshWithSecrets build secretPreamble
using raw process.env values (NVIDIA_API_KEY, GITHUB_TOKEN) which breaks the
shell if they contain single quotes; update sshWithSecrets to sanitize/escape
single quotes in those env values before joining (e.g., replace each ' with the
shell-safe sequence '\''), and use the escaped values when constructing
secretPreamble (apply the same escaping to any other exported secret like
NEMOCLAW_* if needed) so the resulting export lines are safe to interpolate into
the remote shell command.
---
Nitpick comments:
In `@test/e2e/brev-e2e.test.js`:
- Around line 74-87: The current waitForSsh function blocks by spawning a shell
with execSync("sleep ..."); refactor waitForSsh to be async (function
waitForSsh(...)) and replace execSync sleep with a non-blocking await (e.g.,
await new Promise(resolve => setTimeout(resolve, intervalMs))); keep the retry
logic, timeouts for ssh("echo ok", { timeout: 10_000 }) and the brev("refresh")
calls but await where needed, and update any callers (e.g., beforeAll) to be
async and call await waitForSsh(...).
- Around line 137-143: The afterAll cleanup (in test/e2e/brev-e2e.test.js) that
calls brev("delete", INSTANCE_NAME) won't run on forcible termination (SIGKILL),
so add mitigation: either update the repository docs/README or e2e test README
to document this limitation and advise running an external scheduled cleanup, or
add an auxiliary maintenance job (e.g., a CI scheduled workflow or cron job)
that deletes stale instances matching the e2e prefix (e.g., e2e-pr-*) by
invoking the same brev delete logic; ensure the instructions reference the
afterAll cleanup and the brev("delete", INSTANCE_NAME) behavior so maintainers
know what external cleanup must replicate.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 98e4d405-4e6e-4fdf-b20e-fe3a8fb1a861
📒 Files selected for processing (3)
.github/workflows/e2e-brev.yamltest/e2e/brev-e2e.test.jsvitest.config.ts
| - name: Upload test logs | ||
| if: failure() | ||
| uses: actions/upload-artifact@v4 | ||
| with: | ||
| name: e2e-brev-logs | ||
| path: /tmp/brev-e2e-*.log | ||
| if-no-files-found: ignore |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if any script creates brev-e2e logs
rg -n "brev-e2e.*\.log" --type sh --type bash
rg -n "/tmp/brev-e2e"Repository: NVIDIA/NemoClaw
Length of output: 87
🏁 Script executed:
#!/bin/bash
# Find the brev-e2e test file
fd -e js -e ts | grep -i "brev-e2e"Repository: NVIDIA/NemoClaw
Length of output: 84
🏁 Script executed:
#!/bin/bash
# Search for patterns that create brev-e2e logs more broadly
rg "brev-e2e.*\.log" -i
rg "/tmp/brev-e2e" -iRepository: NVIDIA/NemoClaw
Length of output: 41
🏁 Script executed:
#!/bin/bash
# Check the e2e-brev.yaml workflow file content
cat -n .github/workflows/e2e-brev.yaml | head -150Repository: NVIDIA/NemoClaw
Length of output: 5374
🏁 Script executed:
#!/bin/bash
# Read the test file to see if it creates log files
wc -l test/e2e/brev-e2e.test.jsRepository: NVIDIA/NemoClaw
Length of output: 88
🏁 Script executed:
#!/bin/bash
# Read first part of the test file
head -100 test/e2e/brev-e2e.test.jsRepository: NVIDIA/NemoClaw
Length of output: 3346
🏁 Script executed:
#!/bin/bash
# Check if there's a vitest config file
fd -e config.ts -e config.js -e vitest.config.ts -e vitest.config.jsRepository: NVIDIA/NemoClaw
Length of output: 96
🏁 Script executed:
#!/bin/bash
# Read the vitest config file
cat vitest.config.tsRepository: NVIDIA/NemoClaw
Length of output: 904
🏁 Script executed:
#!/bin/bash
# Read the rest of the test file
tail -65 test/e2e/brev-e2e.test.jsRepository: NVIDIA/NemoClaw
Length of output: 2173
🏁 Script executed:
#!/bin/bash
# Check if remote test scripts reference log files
rg "\.log" test/e2e/Repository: NVIDIA/NemoClaw
Length of output: 128
🏁 Script executed:
#!/bin/bash
# List all files in test/e2e directory
ls -la test/e2e/Repository: NVIDIA/NemoClaw
Length of output: 449
Upload path won't find any files since logs are created on the remote VM, not the runner.
The test runs on an ephemeral Brev instance via SSH and creates logs on the remote VM (e.g., /tmp/nemoclaw-e2e-install.log), but the workflow step attempts to upload from the runner's /tmp/brev-e2e-*.log. The if-no-files-found: ignore prevents failure, but no logs are uploaded when tests fail, making debugging harder.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.github/workflows/e2e-brev.yaml around lines 128 - 134, The workflow's
"Upload test logs" step is trying to upload /tmp/brev-e2e-*.log from the runner
but the logs live on the remote Brev VM; add a prior step (e.g., "Download
remote logs" or within the SSH teardown flow that runs before the "Upload test
logs" step) that copies the remote log files (e.g., /tmp/nemoclaw-*-e2e-*.log)
from the VM to the runner workspace (for example via scp or an ssh command that
streams files into $GITHUB_WORKSPACE or /tmp on the runner), then update the
Upload test logs step to point to the downloaded location so artifacts are
actually uploaded. Ensure the download step runs regardless of test failure so
logs are available to upload.
1fb8d7d to
89b90da
Compare
|
Addressing CodeRabbit feedback on secret exposure (#2982050976, #2982050993):
// Secrets piped via stdin, not interpolated into the command
return execSync(
`ssh ... '${INSTANCE_NAME}' 'eval "$(cat)" && ${cmd}'`,
{ input: secretPreamble, ... },
); |
89b90da to
3b7c530
Compare
c6f92d5 to
5f502d6
Compare
|
|
||
| - name: Install Brev CLI | ||
| run: | | ||
| # Pin to v0.6.310 — v0.6.322 removed --cpu flag and defaults to GPU instances |
There was a problem hiding this comment.
--cpu flag was deprecated. The new way to deploy a CPU is as follows brev create --type n2-standard-4 test-cpu
cv
left a comment
There was a problem hiding this comment.
Code looks good — secret handling via stdin is solid, vitest migration is clean, CI is green. The Brev --cpu deprecation is worth a follow-up but not a blocker.
Add a vitest-based E2E test suite that provisions an ephemeral Brev CPU instance, bootstraps it with NemoClaw (Node.js, Docker, OpenShell, gateway, sandbox), runs tests remotely, and optionally keeps the instance alive for SSH debugging. New files: - test/e2e/brev-e2e.test.js: vitest suite — create, bootstrap, test, cleanup - .github/workflows/e2e-brev.yaml: workflow with dispatch inputs, PR reporting, and keep_alive option for SSH debugging Changes: - vitest.config.ts: added e2e-brev project, excluded from default runs - test/e2e/test-full-e2e.sh: pass --yes to destroy, shfmt formatting - .gitignore: added specs/ Key design decisions: - Ephemeral instances — no persistent state, no idle cost - CPU-only (--cpu 4x16) — OpenShell gateway works without GPU - Brev CLI pinned to v0.6.310 (v0.6.322 removed --cpu flag) - Secrets passed via stdin to avoid exposure in process listings - keep_alive defaults to true — instance stays up for SSH debugging - describe.runIf skips tests when secrets unavailable (normal CI) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
d6b3058 to
77f46f2
Compare
…nitization Adds two new Brev E2E test suites targeting the vulnerabilities fixed by PR NVIDIA#119 (Telegram bridge command injection) and PR NVIDIA#156 (credential exposure in migration snapshots + blueprint digest bypass). Test suites: - test-telegram-injection.sh: 8 tests covering command substitution, backtick injection, quote-breakout, parameter expansion, process table leaks, and SANDBOX_NAME validation - test-credential-sanitization.sh: 13 tests covering auth-profiles.json deletion, credential field stripping, non-credential preservation, symlink safety, blueprint digest verification, and pattern-based field detection These tests are expected to FAIL on main (unfixed code) and PASS once PR NVIDIA#119 and NVIDIA#156 are merged. Refs: NVIDIA#118, NVIDIA#119, NVIDIA#156, NVIDIA#813
Adds specification for a warm-pool model that eliminates the 40+ min cold-start Docker build time from Brev E2E test runs. Key design: - Pre-warmed Brev instances using NemoClaw launchable - Naming convention (e2e-warm-*) as state — no external tracking - PR comment trigger (/test-brev <suites>) for maintainers - Parallel test execution across pool instances - Daily health check + 24h instance cycling - Fallback to ephemeral mode if pool is empty 6 phases: pool script → warmer workflow → test runner → PR trigger → run security tests → cleanup Also includes the original NVIDIA#813 hardware E2E spec for reference. Refs: NVIDIA#813, NVIDIA#118, NVIDIA#119, NVIDIA#156
Add a vitest-based E2E test suite that provisions an ephemeral Brev CPU instance, bootstraps it with NemoClaw (Node.js, Docker, OpenShell, gateway, sandbox), runs tests remotely, and optionally keeps the instance alive for SSH debugging. New files: - test/e2e/brev-e2e.test.js: vitest suite — create, bootstrap, test, cleanup - .github/workflows/e2e-brev.yaml: workflow with dispatch inputs, PR reporting, and keep_alive option for SSH debugging Changes: - vitest.config.ts: added e2e-brev project, excluded from default runs - test/e2e/test-full-e2e.sh: pass --yes to destroy, shfmt formatting - .gitignore: added specs/ Key design decisions: - Ephemeral instances — no persistent state, no idle cost - CPU-only (--cpu 4x16) — OpenShell gateway works without GPU - Brev CLI pinned to v0.6.310 (v0.6.322 removed --cpu flag) - Secrets passed via stdin to avoid exposure in process listings - keep_alive defaults to true — instance stays up for SSH debugging - describe.runIf skips tests when secrets unavailable (normal CI) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…nitization Adds two new Brev E2E test suites targeting the vulnerabilities fixed by PR NVIDIA#119 (Telegram bridge command injection) and PR NVIDIA#156 (credential exposure in migration snapshots + blueprint digest bypass). Test suites: - test-telegram-injection.sh: 8 tests covering command substitution, backtick injection, quote-breakout, parameter expansion, process table leaks, and SANDBOX_NAME validation - test-credential-sanitization.sh: 13 tests covering auth-profiles.json deletion, credential field stripping, non-credential preservation, symlink safety, blueprint digest verification, and pattern-based field detection These tests are expected to FAIL on main (unfixed code) and PASS once PR NVIDIA#119 and NVIDIA#156 are merged. Refs: NVIDIA#118, NVIDIA#119, NVIDIA#156, NVIDIA#813
…al sanitization (#1092) * test(security): add E2E tests for command injection and credential sanitization Adds two new Brev E2E test suites targeting the vulnerabilities fixed by PR #119 (Telegram bridge command injection) and PR #156 (credential exposure in migration snapshots + blueprint digest bypass). Test suites: - test-telegram-injection.sh: 8 tests covering command substitution, backtick injection, quote-breakout, parameter expansion, process table leaks, and SANDBOX_NAME validation - test-credential-sanitization.sh: 13 tests covering auth-profiles.json deletion, credential field stripping, non-credential preservation, symlink safety, blueprint digest verification, and pattern-based field detection These tests are expected to FAIL on main (unfixed code) and PASS once PR #119 and #156 are merged. Refs: #118, #119, #156, #813 * ci: temporarily disable repo guard for fork testing * ci: bump bootstrap timeout, skip vLLM on CPU E2E runs - Add SKIP_VLLM=1 support to brev-setup.sh - Use SKIP_VLLM=1 in brev-e2e.test.js bootstrap - Bump beforeAll timeout to 30 min for CPU instances - Bump workflow timeout to 60 min for 3 test suites * ci: bump bootstrap timeout to 40 min for sandbox image build * ci: bump Brev instance to 8x32 for faster Docker builds * ci: add real-time progress streaming for E2E bootstrap and tests - Stream SSH output to CI log during bootstrap (no more silence) - Add timestamps to brev-setup.sh and setup.sh info/warn/fail messages - Add background progress reporter during sandbox Docker build (heartbeat every 30s showing elapsed time, current Docker step, and last log line) - Stream test script output to CI log via tee + capture for assertions - Filter potential secrets from progress heartbeat output * ci: use NemoClaw launchable for E2E bootstrap Replace bare 'brev create' + brev-setup.sh with 'brev start' using the OpenShell-Community launch-nemoclaw.sh setup script. This installs Docker, OpenShell CLI, and Node.js via the launchable's proven path, then runs 'nemoclaw onboard --non-interactive' to build the sandbox (testing whether this path is faster than our manual setup.sh). Changes: - Default CPU back to 4x16 (8x32 didn't help — bottleneck was I/O) - Launchable path: brev start + setup-script URL, poll for completion, rsync PR branch, npm ci, nemoclaw onboard - Legacy path preserved (USE_LAUNCHABLE=0) - Timestamped logging throughout for timing comparison - New use_launchable workflow input (default: true) * fix: prevent openshell sandbox create from hanging in non-interactive mode openshell sandbox create without a command defaults to opening an interactive shell inside the sandbox. In CI (non-interactive SSH), this hangs forever — the sandbox goes Ready but the command never returns. The [?2004h] terminal escape codes in CI logs were bash enabling bracketed paste mode, waiting for input. Add --no-tty -- true so the command exits immediately after the sandbox is created and Ready. * fix: source nvm in non-interactive SSH for launchable path The launchable setup script installs Node.js via nvm, which sets up PATH in ~/.nvm/nvm.sh. Non-interactive SSH doesn't source .bashrc, so npm/node commands fail with 'command not found'. Source nvm.sh before running npm in the launchable path and runRemoteTest. * fix: setup.sh respects NEMOCLAW_SANDBOX_NAME env var setup.sh defaulted to 'nemoclaw' ignoring the NEMOCLAW_SANDBOX_NAME env var set by the CI test harness (e2e-test). Now uses $1 > $NEMOCLAW_SANDBOX_NAME > nemoclaw. * ci: bump full E2E test timeout to 15 min for install + sandbox build * ci: don't run full E2E alongside security tests (it destroys the sandbox) The full E2E test runs install.sh --non-interactive which destroys and rebuilds the sandbox. When TEST_SUITE=all, this kills the sandbox that beforeAll created, causing credential-sanitization and telegram-injection to fail with 'sandbox not running'. Only run full E2E when TEST_SUITE=full. * ci: pre-build base image locally when GHCR image unavailable On forks or before the first base-image workflow run, the GHCR base image (ghcr.io/nvidia/nemoclaw/sandbox-base:latest) doesn't exist. This causes the Dockerfile's FROM to fail. Now setup.sh checks for the base image and builds Dockerfile.base locally if needed. On subsequent builds, Docker layer cache makes this near-instant. Once the GHCR base image is available, this becomes a no-op (docker pull succeeds and the local build is skipped). * ci: install nemoclaw CLI after bootstrap in non-launchable path brev-setup.sh creates the sandbox but doesn't install the host-side nemoclaw CLI that test scripts need for 'nemoclaw <name> status'. Add npm install + build + link step after bootstrap. * fix: use npm_config_prefix for nemoclaw CLI install so it lands on PATH * fix: npm link from repo root where bin.nemoclaw is defined * fix(ci): register sandbox in nemoclaw registry after setup.sh bootstrap setup.sh creates the sandbox via openshell directly but never writes ~/.nemoclaw/sandboxes.json. The security test scripts check `nemoclaw <name> status` which reads the registry, causing all E2E runs to fail with 'Sandbox e2e-test not running'. Write the registry entry after nemoclaw CLI install so the test scripts can find the sandbox. * style: shfmt formatting fix in setup.sh * fix(test): exclude policy presets from C7 secret pattern scan C7 greps for 'npm_' inside the sandbox and false-positives on nemoclaw-blueprint/policies/presets/npm.yaml which contains rule names like 'npm_yarn', not actual credentials. Filter out /policies/ paths from all three pattern checks. * docs(ci): add test suite descriptions to e2e-brev workflow header Document what each test_suite option runs so maintainers can make an informed choice from the Actions UI without reading the test scripts. * ci: re-enable repo guard for e2e-brev workflow Re-enable the github.repository check so the workflow only runs on NVIDIA/NemoClaw, not on forks. * fix(test): update setup-sandbox-name test for NEMOCLAW_SANDBOX_NAME env var setup.sh now uses ${1:-${NEMOCLAW_SANDBOX_NAME:-nemoclaw}} instead of ${1:-nemoclaw}. Update the test to match and add coverage for the env var fallback path. * fix(lint): add shellcheck directives for injection test payloads and fix stdio type * fix(lint): suppress SC2034 for status_output in credential sanitization test * fix: address CodeRabbit review — timeout, pipefail, fail-closed probes, shell injection in test - Bump e2e-brev workflow timeout-minutes from 60 to 90 - Add fail-fast when launchable setup exceeds 40-min wait - Add pipefail to remote pipeline commands in runRemoteTest and npm ci - Fix backtick shell injection in validateName test loop (use process.argv) - Make sandbox_exec fail closed with __PROBE_FAILED__ sentinel - Add probe failure checks in C6/C7 sandbox assertions --------- Co-authored-by: Carlos Villela <cvillela@nvidia.com>
How to use
After merge, any maintainer can trigger a full-stack E2E test against any branch:
trueSet keep_alive to
falseto auto-delete the instance after tests complete.SSH access setup
To SSH into a kept-alive instance, you need to be in the NemoClaw Brev org:
brew install brev(or see docs)brev loginbrev refresh && ssh <instance-name>Summary
e2e-brev) triggered viaworkflow_dispatchorworkflow_callMotivation
Security-critical PRs (e.g., #156) need testing against the real OpenShell + sandbox stack, which the existing Docker sandbox E2E and nightly cloud E2E cannot provide.
Files changed
test/e2e/brev-e2e.test.js.github/workflows/e2e-brev.yamlvitest.config.tse2e-brevproject, excluded E2E from CLI test projecttest/e2e/test-full-e2e.sh--yesto destroy, shfmt formatting.gitignorespecs/Required secrets (must be configured before first run)
BREV_API_TOKENNote: Brev refresh tokens expire in ~24 hours. The secret needs periodic refresh.
NVIDIA_API_KEYAlready exists. Must start with
nvapi-.Validated end-to-end
All 6 phases passed on an ephemeral Brev instance triggered from GitHub Actions (fork run):
Known limitations
BREV_API_TOKENneeds periodic refreshbrev-setup.shwastes ~3 min (separate fix)--cpuflagNext steps (after this PR merges)
/test-brevPR comment trigger —issue_commentworkflow so maintainers can type/test-brevon a PR instead of using the Actions UI. Requires the workflow file onmainfirst.test/e2e/test-credential-sanitization.shexercising realcreateSnapshotBundle()frommigration-state.ts. The dropdown option is wired but no script exists yet.nightly-e2e.yamlcallinge2e-brev.yamlviaworkflow_call.brev-setup.shGPU detection.🤖 Generated with Claude Code